Entropy analysis of word-length series of natural language texts: Effects of text language and genre
Abstract
We estimate the n-gram entropies of natural language texts in word-length representation and find that they are sensitive to text language and genre. We attribute this sensitivity to changes in the probability distribution of single-word lengths, and we emphasize the crucial role of how uniform the probabilities are for words with lengths between five and ten. Furthermore, comparing these values with the entropies of shuffled data reveals the impact of word-length correlations on the estimated n-gram entropies.
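The abstract does not state which entropy estimator or tokenization was used. The following is a minimal sketch under two assumptions: a "word" is a maximal run of letters (digits and punctuation are dropped), and the n-gram entropy is the plug-in (frequency-count) block entropy H_n = -Σ p(w_1…w_n) log2 p(w_1…w_n) over overlapping n-grams of the word-length series. The function names `word_lengths`, `block_entropy` and `shuffled_block_entropy` are hypothetical. Shuffling preserves the single-length distribution, so H_1 is unchanged. For n > 1, any gap between the original and shuffled values reflects correlations between neighbouring word lengths.

```python
import math
import random
import re
from collections import Counter


def word_lengths(text):
    """Map a text to its word-length series.

    Assumption: a word is a maximal run of Unicode letters; digits,
    underscores and punctuation are ignored.
    """
    return [len(w) for w in re.findall(r"[^\W\d_]+", text)]


def block_entropy(seq, n):
    """Plug-in Shannon entropy (in bits) of the overlapping n-grams of seq."""
    if n < 1 or len(seq) < n:
        raise ValueError("need 1 <= n <= len(seq)")
    blocks = [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    # Relative frequencies stand in for the unknown n-gram probabilities.
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def shuffled_block_entropy(seq, n, seed=0):
    """Block entropy after a random permutation of seq.

    Shuffling keeps the distribution of single word lengths but destroys
    correlations between neighbouring lengths.
    """
    rng = random.Random(seed)
    shuffled = list(seq)
    rng.shuffle(shuffled)
    return block_entropy(shuffled, n)
```

For example, `lengths = word_lengths(text)` followed by `block_entropy(lengths, n)` and `shuffled_block_entropy(lengths, n)` for n = 1, 2, 3 gives the two curves to compare. Plug-in estimates are biased downward once the number of possible n-grams approaches the series length. For that reason, only small n are meaningful for short texts.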
Similar resources
EFL Textbook Evaluation: An Analysis of Readability and Vocabulary Profiler of Four Corners Book Series
This study aimed to investigate whether there is any significant relationship between the readability and vocabulary profile including the most frequent words (K1 words) and academic word list (AWL) of reading passages of Four Corners series which were EFL textbooks. To determine the readability of the texts, the Flesch–Kincaid (1975) readability test was used, while the texts' academic word li...
The Intertextuality in an English as a Foreign Language Textbook: An Analytical Study of Interchange Fourth Edition
This study investigated the use of intertextuality in the fourth edition of the Interchange book series for English as a Foreign Language (EFL) learners, using Fairclough’s (1992) framework. Ten texts were randomly chosen from the reading passages of the Interchange book series and then analyzed for kinds of intertextuality and methods of reporting. Findings indicated that two types o...
Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations
The main task of tokenization is to divide the sentences of a text into their constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical writing chain that forms an independent semantic unit. Tokenization occurs at the word level, and the extracted units can be used as input to other components such as a stemmer. The requirement to create...
The Effect of Extensive Reading on Iranian EFL Learners’ Lexical Bundle Performance: a comparative study of adaptive and authentic texts
Formulaic language and sequences, as core characteristics of real-life language and native-like fluency, have been a subject of inquiry in recent decades. The aim of the present study is to investigate the effects of two extensive reading text types, i.e., adaptive and authentic, on Iranian EFL learners’ development of lexical bundles. To this aim, 20 intermediate EFL learners were chosen to pa...
Journal title:
- I. J. Bifurcation and Chaos
Volume 22, Issue
Pages -
Publication date: 2012